On combining frequency warping and spectral shaping in HMM based speech recognition

نویسندگان

  • Alexandros Potamianos
  • Richard C. Rose
چکیده

Frequency warping approaches to speaker normalization have been proposed and evaluated on various speech recognition tasks [1, 2, 3]. These techniques have been found to signi cantly improve performance even for speaker independent recognition from short utterances over the telephone network. In maximum likelihood (ML) based model adaptation a linear transformation is estimated and applied to the model parameters in order to increase the likelihood of the input utterance. The purpose of this paper is to demonstrate that signi cant advantage can be gained by performing frequency warping and ML speaker adaptation in a uni ed framework. A procedure is described which compensates utterances by simultaneously scaling the frequency axis and reshaping the spectral energy contour. This procedure is shown to reduce the error rate in a telephone based connected digit recognition task by 30-40%.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Formant-based frequency warping for improving speaker adaptation in HMM TTS

Vocal Tract Length Normalization (VLTN), usually implemented as a frequency warping procedure (e.g. bilinear transformation), has been used successfully to adapt the spectral characteristics to a target speaker in speech recognition. In this study we exploit the same concept of frequency warping but concentrate explicitly on mapping the first four formant frequencies of 5 long vowels from sourc...

متن کامل

تخمین سریع ضرایب پیچش در هنجارسازی طول مجرای صوتی با استفاده از امتیاز به دست آمده از مدلسازی تشخیص جنسیت

The performance of automatic speech recognition (ASR) systems is adversely affected by the variations in speakers, audio channels and environmental conditions. Making these systems robust to these variations is still a big challenge. One of the main sources of variations in the speakers is the differences between their Vocal Tract Length (VTL). Vocal Tract Length Normalization (VTLN) is an effe...

متن کامل

Automatic speech recognition for children

In this paper, the acoustic and linguistic characteristics of children speech are investigated in the context of automatic speech recognition. Acoustic variability is identi ed as a major hurdle in building high performance ASR applications for children. A simple speaker normalization algorithm combining frequency warping and spectral shaping introduced in [5] is shown to reduce acoustic variab...

متن کامل

Robust speech recognition and feature extraction using HMM2

This paper presents the theoretical basis and preliminary experimental results of a new HMM model, referred to as HMM2, which can be considered as a mixture of HMMs. In this new model, the emission probabilities of the temporal (primary) HMM are estimated through secondary, state specific, HMMs working in the acoustic feature space. Thus, while the primary HMM is performing the usual time warpi...

متن کامل

شبکه عصبی پیچشی با پنجره‌های قابل تطبیق برای بازشناسی گفتار

Although, speech recognition systems are widely used and their accuracies are continuously increased, there is a considerable performance gap between their accuracies and human recognition ability. This is partially due to high speaker variations in speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1997